World model -> encodes assumptions about the system (prior)
Data driven -> observes the outcomes of a system given an input (black box approach)
Quantities:
- data/problem prior (input): p(x)
- labels prior (output): p(t)
- joint probability of inputs and outputs (world model): p(x, t)
Bayes:
- posterior: probability of the labels given the data, p(t|x)
- likelihood: probability of the data given the labels, p(x|t)
Bayes' theorem ties these quantities together: p(t|x) = p(x|t) p(t) / p(x)
Approach 1: Null prior (ignore the data prior p(x))
Basically, what this tells us is that we can compute the posterior (the probability of the outputs given the data inputs, p(t|x)) up to a constant, as the product of the world model function evaluated under the assumption of target labels (the likelihood p(x|t)) and the distribution of the labels (p(t)): p(t|x) ∝ p(x|t) p(t). The second factor is assumed balanced, or at least representative of the real-world problem the data tries to model. This can mean that we may have to use data balancing and other techniques. The first factor, though, is the most important part, and that is where we decide how to model the world.
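As a minimal sketch of the null-prior computation p(t|x) ∝ p(x|t) p(t) (the class names and probability values below are made up for illustration), note that p(x) is never needed because it cancels in the normalization:

```python
def posterior(likelihoods, priors):
    """Null-prior Bayes: p(t|x) ∝ p(x|t) * p(t), normalized over labels t.

    `likelihoods` maps each label t to p(x|t) for a fixed input x;
    `priors` maps each label t to p(t).
    """
    unnorm = {t: likelihoods[t] * priors[t] for t in priors}
    z = sum(unnorm.values())  # normalizer; plays the role of p(x)
    return {t: v / z for t, v in unnorm.items()}

# Hypothetical numbers: the world model says this input is twice as
# likely under "cat" as under "dog", and the label prior is balanced.
print(posterior({"cat": 0.2, "dog": 0.1}, {"cat": 0.5, "dog": 0.5}))
# -> cat gets 2/3, dog gets 1/3
```

A skewed label prior p(t) would shift the posterior in the same way, which is exactly why the balance/representativeness assumption on the second factor matters.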
Classical ML simply takes this problem of approximating p(x|t) and throws data at it, making some assumptions along the way. The main assumption it makes is that the world can be described as a complex Gaussian, so the problem can be solved via the maximum likelihood principle (or, as is done in practice for tractability, by minimizing the negative log-likelihood).
In other words: throw as much data as possible at the model and assume it can learn via the maximum likelihood principle, ignoring the fact that we are modelling the world as a probability distribution. The questions that arise are: Is this enough? Isn't the world just a big probability distribution? Is it Gaussian?
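To make the maximum likelihood principle concrete, here is a small pure-Python sketch for the simplest case, a 1-D Gaussian world model (the data values are arbitrary): the sample mean and variance are exactly the parameters that minimize the negative log-likelihood.

```python
import math

def gaussian_nll(data, mu, sigma):
    # Negative log-likelihood of the data under N(mu, sigma^2);
    # this sum is the quantity minimized in practice for tractability.
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

def mle_gaussian(data):
    # Closed-form maximum likelihood estimates for a 1-D Gaussian.
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, math.sqrt(var)

data = [1.2, 1.9, 2.1, 2.8, 2.0]
mu, sigma = mle_gaussian(data)
# Any other mean gives a strictly larger NLL on this data.
assert gaussian_nll(data, mu, sigma) < gaussian_nll(data, mu + 0.5, sigma)
```

The "complex Gaussian" assumption in modern ML amounts to replacing this closed-form fit with a high-dimensional, parameterized density and minimizing the same NLL objective by gradient descent.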
Approach 2: Null data
This is classical mathematics: it tries to model perfect system behaviour through physics. However, most real-life problems are simply not understood very well, or may not even have good enough physics-based approximations.
How do you define the world model p(x, t), where x lives in X, the RGB image space for some predefined height and width (for simplicity), and t lives in the target space of Cats and Dogs?
It's so much easier to solve these real-world problems through data. The set of interesting problems solvable by the physics approach is limited by the simplicity of the model and by our human understanding of the problem. However, there are advantages:
We can run simulations from a given input state and observe the behaviour of the world after some time. This is the domain of chaos theory: some systems are stable and some are unstable, and there is little understanding of the physics behind the latter. The stable ones are also interesting: how do we know a system is stable? How do we guarantee that we model our real-life problem using a stable system, so that our outputs are explainable without chaos-theory assumptions (i.e. they are deterministic)?
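A toy illustration of the stable/unstable split is the logistic map, a standard textbook example (the parameter values below are chosen for illustration): the same update rule is predictable for one parameter and chaotic for another, where two nearly identical initial states quickly stop agreeing.

```python
def logistic_trajectory(x0, r, steps):
    # Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n).
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Stable regime (r = 2.5): very different starting points converge to
# the same fixed point 1 - 1/r = 0.6, so outputs are explainable.
a = logistic_trajectory(0.2, 2.5, 100)[-1]
b = logistic_trajectory(0.8, 2.5, 100)[-1]
assert abs(a - 0.6) < 1e-9 and abs(b - 0.6) < 1e-9

# Chaotic regime (r = 4.0): a 1e-7 perturbation of the initial state
# leads to trajectories that eventually disagree completely.
c = logistic_trajectory(0.2, 4.0, 60)
d = logistic_trajectory(0.2 + 1e-7, 4.0, 60)
assert max(abs(u - v) for u, v in zip(c, d)) > 0.1
```

This is exactly the guarantee question above in miniature: for r = 2.5 a simulation answers questions about the long-run behaviour, while for r = 4.0 any measurement error in the input state destroys the prediction.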
Approach 3: Hybrid model
Like everything in life, an approach that takes both sides (black box and perfect physics) into account is desirable. In this case the problem becomes an iterative one:
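One possible shape of such a hybrid, sketched with hypothetical names and a deliberately trivial toy world (y = 2x + 1): a physics prior makes a first prediction, and a data-driven part fits only the residual the physics misses.

```python
def physics_model(x):
    # World-model side (assumed form for this sketch): idealized dynamics
    # that capture part of the truth (the slope) but miss an effect.
    return 2.0 * x

def fit_hybrid(inputs, targets):
    # Data-driven side: fit only the residual the physics leaves behind.
    # Here the "learner" is just the mean residual (a constant bias);
    # in practice it would be a trained model.
    residuals = [t - physics_model(x) for x, t in zip(inputs, targets)]
    bias = sum(residuals) / len(residuals)
    return lambda x: physics_model(x) + bias

# True world: y = 2x + 1; the physics knows the slope but not the offset,
# and the data fills in the missing piece.
model = fit_hybrid([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
assert model(3.0) == 7.0
```

The iteration then alternates between improving the physics prior where the residuals are large and retraining the data-driven part on what the improved physics still cannot explain.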